Probability Density Grid-based Online Clustering for Uncertain Data Streams
Authors
Abstract
Most existing stream clustering algorithms combine an online component with an offline component. The disadvantage of such two-phase algorithms is that they cannot generate the final clusters online; accurate clustering results are obtained only through offline analysis. Furthermore, existing clustering algorithms for uncertain data streams cannot find clusters of arbitrary shape given the variability of uncertain data streams. To address these issues, this paper proposes a novel algorithm, PDG-OCUStream (Probability Density Grid-based Online Clustering for Uncertain Data Streams), in which the summary information of an uncertain data stream is stored in a probability density grid together with related statistical values. Setting a probability density threshold controls clustering quality effectively, and the probability density grid structure is easy to maintain and update, which improves the efficiency of online clustering. The algorithm also uses a count-based sliding window, which reflects the current state of the uncertain data stream; adjusting the step of the sliding window saves system resources. In addition, the paper defines a grid probability density similarity, which is used to initialize and update clusters by merging connected probability density grids, so the algorithm can distinguish dense regions from sparse regions and quickly find the clusters in the data distribution in real time. Experimental results show that PDG-OCUStream clusters quickly online while maintaining good clustering quality.
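The abstract gives no formulas, so the following Python sketch only illustrates the general idea of a probability density grid maintained over a count-based sliding window. Everything in it is an assumption made for illustration and is not taken from the paper: the class name `ProbabilityDensityGrid`, the parameters `cell_size`, `window_size` and `density_threshold`, the definition of a cell's density as the sum of the existence probabilities of the windowed points that fall in it, and the rule that dense cells sharing a face belong to the same cluster. The paper's grid probability density similarity is not defined in the abstract and is not modelled here.

```python
from collections import defaultdict, deque


class ProbabilityDensityGrid:
    """Toy probability density grid over a count-based sliding window.

    Illustrative assumptions (not from the paper): a cell's density is the
    sum of existence probabilities of the windowed points inside it, and
    dense cells that share a face are merged into one cluster.
    """

    def __init__(self, cell_size, window_size, density_threshold):
        self.cell_size = cell_size
        self.window_size = window_size
        self.density_threshold = density_threshold
        self.window = deque()              # (cell, probability) of recent points
        self.density = defaultdict(float)  # cell -> summed probability

    def _cell(self, point):
        # Map a coordinate tuple to the integer index of its grid cell.
        return tuple(int(x // self.cell_size) for x in point)

    def insert(self, point, probability):
        # Add the new point, then evict the oldest one if the window is full.
        cell = self._cell(point)
        self.window.append((cell, probability))
        self.density[cell] += probability
        if len(self.window) > self.window_size:
            old_cell, old_probability = self.window.popleft()
            self.density[old_cell] -= old_probability
            if self.density[old_cell] <= 1e-12:
                del self.density[old_cell]

    def clusters(self):
        # Keep cells at or above the threshold, then group adjacent ones.
        dense = {c for c, d in self.density.items()
                 if d >= self.density_threshold}
        seen, result = set(), []
        for start in sorted(dense):
            if start in seen:
                continue
            seen.add(start)
            component, stack = [], [start]
            while stack:
                cell = stack.pop()
                component.append(cell)
                for axis in range(len(cell)):
                    for delta in (-1, 1):
                        neighbour = (cell[:axis] + (cell[axis] + delta,)
                                     + cell[axis + 1:])
                        if neighbour in dense and neighbour not in seen:
                            seen.add(neighbour)
                            stack.append(neighbour)
            result.append(sorted(component))
        return result
```

In this sketch, the step of the sliding window would correspond to how many calls to `insert` are made between calls to `clusters`: a larger step means fewer clustering passes and lower resource use, at the cost of less frequent updates.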
Similar resources
Adjustable Probability Density Grid-Based Clustering for Uncertain Data Streams
Most traditional grid-based clustering algorithms for uncertain data streams use a fixed meshing method and therefore suffer from low clustering accuracy. To address this deficiency, this paper proposes a novel algorithm, APDG-CUStream (Adjustable Probability Density Grid-based Clustering for Uncertain Data Streams), which adopts an online component and an offline component. In o...
DENGRIS-Stream: A Density-Grid based Clustering Algorithm for Evolving Data Streams over Sliding Window
Evolving data streams are ubiquitous. Various clustering algorithms have been developed to extract useful knowledge from evolving data streams in real time. Density-based clustering methods can handle outliers and discover clusters of arbitrary shape, whereas grid-based clustering offers fast processing. The sliding window is a widely used model for data stream mining due to its e...
Research on Clustering Algorithm Based on Grid Density on Uncertain Data Stream
To address the clustering omissions that occur during the adjustment cycle of grid-density-based clustering algorithms on uncertain data streams, the paper proposes an algorithm named GCUDS, which clusters uncertain data streams using a grid structure. The concept of data trend degree is defined to describe the degree to which a data point belongs to a grid unit, and the defect of information loss around grid units wa...
MuDi-Stream: A multi density clustering algorithm for evolving data stream
Density-based methods have emerged as a valuable class of techniques for clustering data streams, and a number of density-based algorithms have recently been developed for this purpose. However, existing density-based data stream clustering algorithms are not without problems: clustering quality decreases dramatically when the density of the data varies. In this paper, a new metho...
Clustering over High-Dimensional Data Streams Based on Grid Density and Effective Dimension
Clustering algorithms based on grid and density have many excellent features. For high-dimensional data streams, however, the number of grids increases sharply as the dimensionality of the space grows. To overcome this defect, we propose GDH-Stream, a clustering method based on effective dimension and grid density for high-dimensional data streams, which consists of an online component and an offlin...